- AWK(1) AWK(1)
-
-
- NAME
- awk - pattern-directed scanning and processing language
-
- SYNOPSIS
- awk [ -F fs ] [ -v var=value ] [ 'prog' | -f progfile ] [ file ... ]
-
- DESCRIPTION
- Awk scans each input file for lines that match any of a
- set of patterns specified literally in prog or in one or
- more files specified as -f progfile. With each pattern
- there can be an associated action that will be performed
- when a line of a file matches the pattern. Each line is
- matched against the pattern portion of every
- pattern-action statement; the associated action is performed
- for each matched pattern. The file name - means the standard
- input. Any file of the form var=value is treated as an
- assignment, not a filename, and is executed at the time it
- would have been opened if it were a filename. The option
- -v followed by var=value is an assignment to be done
- before prog is executed; any number of -v options may be
- present. The -F fs option defines the input field
- separator to be the regular expression fs.
-
- An input line is normally made up of fields separated by
- white space. (This default can be changed by using the FS
- built-in variable or the -F fs option.) The fields are
- denoted $1, $2, ..., while $0 refers to the entire line.
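-
- A minimal sketch of field splitting, assuming a POSIX shell
- and an awk on the PATH. The sample input and the shell
- variable names are invented for illustration.
-

```sh
# Default splitting on white space: $2 is the second field, $0 the whole line.
fields_default=$(printf 'alpha beta gamma\n' | awk '{ print $2; print $0 }')

# -F: makes the colon the field separator for this run.
fields_colon=$(printf 'root:x:0\n' | awk -F: '{ print $1, $3 }')

printf '%s\n' "$fields_default" "$fields_colon"
```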
-
- A pattern-action statement has the form
-
- pattern { action }
-
- A missing { action } means print the line; a missing
- pattern always matches. Pattern-action statements are
- separated by newlines or semicolons.
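-
- The two defaults can be seen side by side in a short sketch.
- The three-line input is made up for the example.
-

```sh
input=$(printf 'one\ntwo\nthree\n')

# Pattern with no action: matching lines are printed as they are.
pattern_only=$(printf '%s\n' "$input" | awk '/t/')

# Action with no pattern: the action runs for every line.
action_only=$(printf '%s\n' "$input" | awk '{ print NR ": " $0 }')

printf '%s\n' "$pattern_only" "$action_only"
```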
-
- An action is a sequence of statements. A statement can be
- one of the following:
-
- if( expression ) statement [ else statement ]
- while( expression ) statement
- for( expression ; expression ; expression ) statement
- for( var in array ) statement
- do statement while( expression )
- break
- continue
- { [ statement ... ] }
- expression # commonly var = expression
- print [ expression-list ] [ > expression ]
- printf format [ , expression-list ] [ > expression ]
- return [ expression ]
- next # skip remaining patterns on this input line
- delete array[ expression ] # delete an array element
- exit [ expression ] # exit immediately; status is expression
-
- Statements are terminated by semicolons, newlines or right
- braces. An empty expression-list stands for $0. String
- constants are quoted " ", with the usual C escapes
- recognized within. Expressions take on string or numeric
- values as appropriate, and are built using the operators + -
- * / % ^ (exponentiation), and concatenation (indicated by
- a blank). The operators ! ++ -- += -= *= /= %= ^= > >= <
- <= == != ?: are also available in expressions. Variables
- may be scalars, array elements (denoted x[i]) or fields.
- Variables are initialized to the null string. Array
- subscripts may be any string, not necessarily numeric; this
- allows for a form of associative memory. Multiple
- subscripts such as [i,j,k] are permitted; the constituents
- are concatenated, separated by the value of SUBSEP.
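-
- A short sketch of string subscripts and multiple subscripts.
- The array names count and pair, the variable never_set, and
- the input are illustrative only.
-

```sh
arrays_out=$(printf 'a x\nb y\na x\n' | awk '
    { count[$1]++; pair[$1, $2]++ }     # subscripts are strings
    END {
        print "a:", count["a"]
        # (i,j) in array tests for a multiple-subscript element
        if (("a", "x") in pair) print "a,x:", pair["a", "x"]
        # the same element, with the subscript built by hand from SUBSEP
        print "via SUBSEP:", pair["a" SUBSEP "x"]
        # an unset variable is the null string, which acts as 0 in arithmetic
        print "unset+0:", never_set + 0
    }')
printf '%s\n' "$arrays_out"
```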
-
- The print statement prints its arguments on the standard
- output (or on a file if >file or >>file is present or on a
- pipe if |cmd is present), separated by the current output
- field separator, and terminated by the output record
- separator. file and cmd may be literal names or parenthesized
- expressions; identical string values in different
- statements denote the same open file. The printf statement
- formats its expression list according to the format (see
- printf(3)). The built-in function close(expr) closes the
- file or pipe expr.
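-
- A sketch of printing to a pipe and closing it, assuming
- sort(1) is available. Because the string "sort" is identical
- in every print, all records go to one sort process; close
- waits for it to finish before the END action continues.
-

```sh
sorted=$(printf 'pear\napple\nfig\n' | awk '
    { print $0 | "sort" }                  # same string, same open pipe
    END { close("sort"); print "done" }    # sort output appears before "done"
')
printf '%s\n' "$sorted"
```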
-
- The mathematical functions exp, log, sqrt, sin, cos, and
- atan2 are built in. Other built-in functions:
-
- length the length of its argument taken as a string, or of
- $0 if no argument.
-
- rand random number on (0,1)
-
- srand sets seed for rand and returns the previous seed.
-
- int truncates to an integer value
-
- substr(s, m, n)
- the n-character substring of s that begins at
- position m counted from 1.
-
- index(s, t)
- the position in s where the string t occurs, or 0
- if it does not.
-
- match(s, r)
- the position in s where the regular expression r
- occurs, or 0 if it does not. The variables RSTART
- and RLENGTH are set to the position and length of
- the matched string.
-
- split(s, a, fs)
- splits the string s into array elements a[1], a[2],
- ..., a[n], and returns n. The separation is done
- with the regular expression fs or with the field
- separator FS if fs is not given.
-
- sub(r, t, s)
- substitutes t for the first occurrence of the
- regular expression r in the string s. If s is not
- given, $0 is used.
-
- gsub same as sub except that all occurrences of the
- regular expression are replaced; sub and gsub return
- the number of replacements.
-
- sprintf(fmt, expr, ... )
- the string resulting from formatting expr ...
- according to the printf(3) format fmt
-
- system(cmd)
- executes cmd and returns its exit status
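-
- The string functions above can be exercised in a BEGIN
- action, which needs no input. This is a sketch; the sample
- string and the variable names s, t, k, n and parts are
- invented for the example.
-

```sh
strings_out=$(awk 'BEGIN {
    s = "hello world"
    print substr(s, 1, 5)                       # hello
    print index(s, "world")                     # 7
    if (match(s, /o+/)) print RSTART, RLENGTH   # 5 1
    n = split("a:b:c", parts, ":")
    print n, parts[1], parts[3]                 # 3 a c
    t = s
    k = gsub(/o/, "0", t)                       # replaces every o in t
    print k, t                                  # 2 hell0 w0rld
    sub(/l+/, "L", t)                           # replaces only the first run of l
    print t                                     # heL0 w0rld
    print length(s)                             # 11
    print sprintf("%05.1f", 3.14159)            # 003.1
}')
printf '%s\n' "$strings_out"
```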
-
- The ``function'' getline sets $0 to the next input record
- from the current input file; getline <file sets $0 to the
- next record from file. getline x sets variable x instead.
- Finally, cmd | getline pipes the output of cmd into
- getline; each call of getline returns the next line of output
- from cmd. In all cases, getline returns 1 for a successful
- input, 0 for end of file, and -1 for an error.
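-
- A sketch of reading a command's output with getline. The
- command string is an arbitrary example; comparing the return
- value with 0 keeps the loop from spinning if getline ever
- returns -1.
-

```sh
getline_out=$(awk 'BEGIN {
    cmd = "echo one; echo two"            # illustrative command, run by the shell
    while ((cmd | getline line) > 0) {    # 1 = record read, 0 = end of file, -1 = error
        n++
        print n ": " line
    }
    close(cmd)
    print "lines read:", n
}')
printf '%s\n' "$getline_out"
```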
-
- Patterns are arbitrary Boolean combinations (with ! || &&)
- of regular expressions and relational expressions.
- Regular expressions are as in egrep; see grep(1). Isolated
- regular expressions in a pattern apply to the entire line.
- Regular expressions may also occur in relational
- expressions, using the operators ~ and !~. /re/ is a constant
- regular expression; any string (constant or variable) may
- be used as a regular expression, except in the position of
- an isolated regular expression in a pattern.
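-
- A sketch of the pattern forms just described: a Boolean
- combination, a string variable used as a regular expression
- with ~, and a negated isolated expression. The input and the
- variable name re are made up.
-

```sh
patterns_out=$(printf 'id42 ok\nid7 fail\nname ok\n' | awk '
    BEGIN { re = "^n" }                                      # a string used as a regular expression
    $1 ~ /^id[0-9]+$/ && $2 != "fail" { print "match:", $1 }
    $1 ~ re                           { print "dynamic:", $1 }
    !/ok/                             { print "no ok:", $0 } # isolated regex tests all of $0
')
printf '%s\n' "$patterns_out"
```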
-
- A pattern may consist of two patterns separated by a
- comma; in this case, the action is performed for all lines
- from an occurrence of the first pattern through an
- occurrence of the second.
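-
- A small sketch of a range pattern on invented input. Note
- that the second range is never closed by a stop line, so it
- runs to the end of the input.
-

```sh
range_out=$(printf 'a\nstart\nb\nstop\nc\nstart\nd\n' | awk '/start/, /stop/')
printf '%s\n' "$range_out"
```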
-
- A relational expression is one of the following:
-
- expression matchop regular-expression
- expression relop expression
- expression in array-name
- (expr,expr,...) in array-name
-
- where a relop is any of the six relational operators in C,
- and a matchop is either ~ (matches) or !~ (does not
- match). A conditional is an arithmetic expression, a
- relational expression, or a Boolean combination of these.
-
- The special patterns BEGIN and END may be used to capture
- control before the first input line is read and after the
- last. BEGIN and END do not combine with other patterns.
-
- Variable names with special meanings:
-
- FS regular expression used to separate fields; also
- settable by option -Ffs.
-
- NF number of fields in the current record
-
- NR ordinal number of the current record
-
- FNR ordinal number of the current record in the current
- file
-
- FILENAME
- the name of the current input file
-
- RS input record separator (default newline)
-
- OFS output field separator (default blank)
-
- ORS output record separator (default newline)
-
- OFMT output format for numbers (default %.6g)
-
- SUBSEP separates multiple subscripts (default 034)
-
- ARGC argument count, assignable
-
- ARGV argument array, assignable; non-null members are
- taken as filenames
-
- ENVIRON
- array of environment variables; subscripts are
- names.
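-
- A sketch showing a few of these variables in use. The input
- and the environment variable DEMO_VAR are hypothetical.
-

```sh
# OFS separates the comma-separated print arguments; $NF is the last field.
vars_out=$(printf 'a b c\nd e\n' | awk '
    BEGIN { OFS = "-" }
    { print NR, NF, $NF }
    END { print "total", NR }')

# ENVIRON is subscripted by environment variable name.
env_out=$(DEMO_VAR=hi awk 'BEGIN { print ENVIRON["DEMO_VAR"] }')

printf '%s\n' "$vars_out" "$env_out"
```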
-
- Functions may be defined (at the position of a pattern-
- action statement) thus:
-
- function foo(a, b, c) { ...; return x }
-
- Parameters are passed by value if scalar and by reference
- if array name; functions may be called recursively.
- Parameters are local to the function; all other variables
- are global. Thus local variables may be created by
- providing excess parameters in the function definition.
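-
- A sketch of the scope and parameter-passing rules. The
- function names fact, rfact and fill are invented; the extra
- spaces before i and acc are only a common convention for
- marking parameters that serve as locals.
-

```sh
func_out=$(awk '
    # n is the real parameter; i and acc are excess parameters used as locals
    function fact(n,    i, acc) {
        acc = 1
        for (i = 2; i <= n; i++) acc *= i
        return acc
    }
    # functions may call themselves
    function rfact(n) { return n <= 1 ? 1 : n * rfact(n - 1) }
    # an array is passed by reference, so the caller sees this assignment
    function fill(arr) { arr["k"] = "set by callee" }
    BEGIN {
        i = "global i"
        print fact(5), rfact(5)
        print i                 # unchanged: the i inside fact is local
        a["x"] = 1
        fill(a)
        print a["k"]
    }')
printf '%s\n' "$func_out"
```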
-
- EXAMPLES
- length > 72
- Print lines longer than 72 characters.
-
- { print $2, $1 }
- Print first two fields in opposite order.
-
- BEGIN { FS = ",[ \t]*|[ \t]+" }
- { print $2, $1 }
-
- Same, with input fields separated by comma and/or
- blanks and tabs.
-
- { s += $1 }
- END { print "sum is", s, " average is", s/NR }
- Add up first column, print sum and average.
-
- /start/, /stop/
- Print all lines between start/stop pairs.
-
- BEGIN { # Simulate echo(1)
- for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
- printf "\n"
- exit }
-
- SEE ALSO
- lex(1), sed(1)
- A. V. Aho, B. W. Kernighan, P. J. Weinberger, The AWK
- Programming Language, Addison-Wesley, 1988.
-
- BUGS
- There are no explicit conversions between numbers and
- strings. To force an expression to be treated as a number
- add 0 to it; to force it to be treated as a string
- concatenate "" to it.
- The scope rules for variables in functions are a botch;
- the syntax is worse.
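-
- A sketch of the two conversion idioms, using comparisons to
- make the difference visible. The values are arbitrary; 1
- means the comparison is true and 0 means it is false.
-

```sh
conv_out=$(awk 'BEGIN {
    a = "10"; b = "9"                # string constants
    print (a < b)                    # compared as strings: "10" sorts before "9"
    print (a + 0 < b + 0)            # adding 0 forces a numeric comparison
    n = 3
    print (n < 20)                   # numeric comparison
    print ((n "") < (20 ""))         # concatenating "" forces a string comparison
}')
printf '%s\n' "$conv_out"
```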